21 research outputs found

    Exploiting Context-Dependent Quality Metadata for Linked Data Source Selection

    The traditional Web is evolving into the Web of Data, which consists of huge collections of structured data spread over poorly controlled, distributed data sources. Live queries are needed to get current information out of this global data space. In live query processing, source selection deserves attention since it allows us to identify the sources that are likely to contain the relevant data. The thesis proposes a source selection technique for live query processing on Linked Open Data that takes into account the context of the request and the quality of the data contained in the sources. The aim is to enhance both the relevance of the answers (since the context enables a better interpretation of the request) and their quality (since the answers are obtained by processing the request on the selected sources). Specifically, the thesis extends the QTree indexing structure, previously proposed as a data summary to support source selection based on source content, to take into account quality and contextual information. With reference to a specific case study, the thesis also contributes an approach, relying on the Luzzu framework, to assess the quality of a source with respect to a given context (according to different quality dimensions). An experimental evaluation of the proposed techniques is also provided.

    Interlinking SciGraph and DBpedia Datasets Using Link Discovery and Named Entity Recognition Techniques

    In recent years we have seen a proliferation of Linked Open Data (LOD) compliant datasets becoming available on the web, leading to more opportunities for data consumers to build smarter applications that integrate data coming from disparate sources. However, the integration is often not easily achievable, since it requires discovering and expressing associations across heterogeneous datasets. The goal of this work is to increase the discoverability and reusability of scholarly data by integrating them with highly interlinked datasets in the LOD cloud. To do so, we applied techniques that a) improve identity resolution across these two sources using Link Discovery for the structured data (i.e. by annotating Springer Nature (SN) SciGraph entities with links to DBpedia entities), and b) enrich SN SciGraph unstructured text content (document abstracts) with links to DBpedia entities using Named Entity Recognition (NER). We published the results of this work using standard vocabularies and provided an interactive exploration tool which presents the discovered links w.r.t. the breadth and depth of the DBpedia classes.

    Context Aware Source Selection for Linked Data

    The traditional Web is evolving into the Web of Data, which gathers huge collections of structured data over distributed, heterogeneous data sources. Live queries are needed to get current information out of this global data space. In live query processing, source selection allows the identification of the sources that most likely contain relevant content. Due to the semantic heterogeneity of the Web of Data, however, it is not always easy to assess relevancy. Context information might help in interpreting the user’s information needs. In this paper, we discuss how context information can be exploited to improve source selection.

    LinkedDataOps: linked data operations based on quality process cycle

    This paper describes three new Geospatial Linked Data (GLD) quality metrics that help evaluate conformance to standards. Standards conformance is a key quality criterion, for example for FAIR data. The metrics were implemented in the open source Luzzu quality assessment framework and used to evaluate four public geospatial datasets, which showed a wide variation in standards conformance. This is the first set of Linked Data quality metrics developed specifically for GLD.

    Distribution and occurrence of microsporidian pathogens of the willow flea beetle, Crepidodera aurata (Coleoptera: Chrysomelidae) in North Turkey

    In this study, microsporidian pathogens in Crepidodera aurata populations were investigated. In total, 1,728 C. aurata adults were examined for microsporidian pathogens and 78 of them were found to be infected. Two species of microsporidia, Microsporidium sp. 1 and Microsporidium sp. 2, were observed in the C. aurata populations from ten localities in North Turkey. They differ considerably from each other in spore morphology and dimensions, infection rate and host locality. The spores of Microsporidium sp. 1 were oval in shape and measured from 3.66 to 5.66 µm in length and from 1.35 to 2.22 µm in width (n=50). The spores of Microsporidium sp. 2 were slightly curled and measured from 2.44 to 3.55 µm in length and from 1.25 to 1.55 µm in width (n=50). These microsporidia were recorded from C. aurata for the first time. Here we present the occurrence and distribution of two microsporidia in C. aurata populations as potential natural suppressing factors.

    Quality metrics to measure the standards conformance of geospatial linked data

    This paper describes three new Geospatial Linked Data (GLD) quality metrics that help evaluate conformance to standards. Standards conformance is a key quality criterion, for example for FAIR data. The metrics were implemented in the open source Luzzu quality assessment framework and used to evaluate four public geospatial datasets, which showed a wide variation in standards conformance. This is the first set of Linked Data quality metrics developed specifically for GLD.

    A SKOS taxonomy of the UN global geospatial information management data theme

    Complex data domains increase the difficulty of structuring, sharing, discovering and governing information. For the geospatial domain, common models such as INSPIRE have been established in the European Union. The United Nations initiative on Global Geospatial Information Management (UN-GGIM) draws together national and regional capacities. Interoperability is the main principle behind these initiatives. Nonetheless, there is a lack of published research to date on mapping agency geospatial linked data leveraging the UN-GGIM taxonomy of information management data themes. Thus, we have identified use cases and defined a Simple Knowledge Organization System (SKOS; https://www.w3.org/TR/skos-reference/) taxonomy expressing the UN-GGIM data themes for national spatial infrastructure. This has been applied in a metadata generation and reporting tool for Ordnance Survey Ireland (OSi), which underpinned improved governance and reporting infrastructure in OSi. This demonstrated the contribution of Semantic Web technology to spatial data governance as well as its importance for data publishing. This paper presents a documented open license SKOS taxonomy for the UN-GGIM data themes that follows Linked Data best practices. It provides a set of three use cases, an overview of UN-GGIM theme definitions and an example application of the taxonomy for deployment in OSi for DCAT metadata generation and data publishing pipeline reporting.

    Data quality and patient characteristics in European ANCA-associated vasculitis registries: data retrieval by federated querying

    Objectives: This study aims to describe the data structure and harmonisation process, explore data quality and define characteristics, treatment, and outcomes of patients across six federated antineutrophil cytoplasmic antibody-associated vasculitis (AAV) registries. Methods: Through creation of the vasculitis-specific Findable, Accessible, Interoperable, Reusable, VASCulitis ontology, we harmonised the registries and enabled semantic interoperability. We assessed data quality across the domains of uniqueness, consistency, completeness and correctness. Aggregated data were retrieved using the semantic query language SPARQL Protocol and Resource Description Framework Query Language (SPARQL) and outcome rates were assessed through random effects meta-analysis. Results: A total of 5282 cases of AAV were identified. Uniqueness and data-type consistency were 100% across all assessed variables. Completeness and correctness ranged from 49% to 100% and from 60% to 100%, respectively. There were 2754 (52.1%) cases classified as granulomatosis with polyangiitis (GPA), 1580 (29.9%) as microscopic polyangiitis and 937 (17.7%) as eosinophilic GPA. The pattern of organ involvement included: lung in 3281 (65.1%), ear-nose-throat in 2860 (56.7%) and kidney in 2534 (50.2%). Intravenous cyclophosphamide was used as remission induction therapy in 982 (50.7%), rituximab in 505 (17.7%) and pulsed intravenous glucocorticoid use was highly variable (11%–91%). Overall mortality and incidence rates of end-stage kidney disease were 28.8 (95% CI 19.7 to 42.2) and 24.8 (95% CI 19.7 to 31.1) per 1000 patient-years, respectively. Conclusions: In the largest reported AAV cohort study, we federated patient registries using semantic web technologies and highlighted concerns about data quality. The comparison of patient characteristics, treatment and outcomes was hampered by heterogeneous recruitment settings.